Learning to Count: Robust Estimates for Labeled Distances between Molecular Sequences

Abstract

Researchers routinely estimate distances between molecular sequences using continuous-time Markov chain models. We present a new method, robust counting, that protects against the potentially severe bias arising from model misspecification. We achieve this robustness by generalizing conventional distance estimation to incorporate the empirical distribution of site patterns found in the observed pairwise sequence alignment. Our flexible framework allows for computing distances based only on a subset of possible substitutions. From this, we show how to estimate labeled codon distances, such as expected numbers of synonymous or nonsynonymous substitutions. We present two simulation studies. The first compares the relative bias and variance of conventional and robust labeled nucleotide estimators. In the second simulation, we demonstrate that robust counting furnishes accurate synonymous and nonsynonymous distance estimates based only on easy-to-fit models of nucleotide substitution, bypassing the need for computationally expensive codon models. We conclude with three empirical examples. In the first two examples, we investigate the evolutionary dynamics of the influenza A hemagglutinin gene using labeled codon distances. In the final example, we demonstrate the advantages of using robust synonymous distances to alleviate the effect of convergent evolution on phylogenetic analysis of an HIV transmission network.
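
The core idea of robust counting, as the abstract describes it, can be sketched for a single pair of aligned nucleotide sequences. A conventional labeled distance averages the conditional expected number of labeled substitutions, E[N_L | site pattern], over the site-pattern distribution implied by the fitted model. Robust counting averages the same conditional expectations over the empirical site patterns in the alignment instead. This minimal sketch assumes the Jukes-Cantor (JC69) model, uses its closed-form distance in place of a fitted branch length, takes transitions as the labeled subset L, and evaluates the time integral with Simpson's rule. All names are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter

NUCS = "ACGT"
# Hypothetical label set L: transitions (A<->G, C<->T), as ordered state pairs.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
# JC69 off-diagonal rate, scaled so each row's total leaving rate is 1
# (t is then the expected number of substitutions per site).
JC_RATE = 1.0 / 3.0


def jc_prob(i, j, t):
    """JC69 transition probability P_ij(t) under the scaling above."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def jc_distance(seq1, seq2):
    """Closed-form JC69 estimate t = -3/4 ln(1 - 4p/3), p = proportion of differing sites."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def simpson(f, lo, hi, steps=200):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    acc = f(lo) + f(hi)
    for k in range(1, steps):
        acc += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return acc * h / 3.0


def expected_labeled_count(a, b, t, labels):
    """E[N_L(t) | X(0)=a, X(t)=b]
    = sum_{(i,j) in L} q_ij * integral_0^t P_ai(s) P_jb(t-s) ds / P_ab(t)."""
    if t == 0.0:
        return 0.0
    total = 0.0
    for i, j in labels:
        total += JC_RATE * simpson(lambda s: jc_prob(a, i, s) * jc_prob(j, b, t - s), 0.0, t)
    return total / jc_prob(a, b, t)


def conventional_labeled_distance(t, labels):
    """Model-based E[N_L(t)]: with a stationary start this is
    t * sum_{(i,j) in L} pi_i q_ij, where pi_i = 1/4 under JC69."""
    return t * sum(0.25 * JC_RATE for _ in labels)


def robust_labeled_distance(seq1, seq2, labels):
    """Robust counting: average conditional expected labeled counts over
    the site patterns actually observed in the pairwise alignment."""
    t = jc_distance(seq1, seq2)
    patterns = Counter(zip(seq1, seq2))
    total = sum(n * expected_labeled_count(a, b, t, labels) for (a, b), n in patterns.items())
    return total / len(seq1)


if __name__ == "__main__":
    s1 = "AAAAAAAAAACCCCCCCCCC"
    s2 = "AAAAAAAAGGCCCCCCCCTT"  # every observed difference is a transition
    t_hat = jc_distance(s1, s2)
    print("conventional:", conventional_labeled_distance(t_hat, TRANSITIONS))
    print("robust:      ", robust_labeled_distance(s1, s2, TRANSITIONS))
```

In the toy alignment every difference is a transition, so the robust estimate comes out well above the conventional value of t/3, which is what JC69 predicts because it assigns transitions one third of all substitutions. That gap between model and data is the kind of misspecification robust counting is meant to guard against. If the empirical pattern frequencies matched the model's, the two estimates would coincide.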
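
Labeled codon distances such as synonymous and nonsynonymous counts differ only in which substitutions belong to the label set L. As a hedged illustration, suppose a 64-state codon chain in which each codon position evolves independently under a nucleotide model, so that an instantaneous change alters exactly one nucleotide (an assumption of this sketch). The synonymous and nonsynonymous label sets can then be built from the standard genetic code as below. The names `CODE` and `codon_label_sets`, and the decision to leave changes to or from stop codons unlabeled, are hypothetical choices, not details taken from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated with the first
# position varying slowest: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_label_sets(code):
    """Split ordered single-nucleotide codon changes into synonymous and
    nonsynonymous label sets. Changes to or from stop codons ('*') are
    left out of both sets (an assumption of this sketch)."""
    syn, nonsyn = set(), set()
    for c1, c2 in product(code, repeat=2):
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue  # assumed chain: only one position changes at a time
        aa1, aa2 = code[c1], code[c2]
        if "*" in (aa1, aa2):
            continue
        (syn if aa1 == aa2 else nonsyn).add((c1, c2))
    return syn, nonsyn


if __name__ == "__main__":
    syn, nonsyn = codon_label_sets(CODE)
    print(len(syn), len(nonsyn))  # 134 392
```

The counts agree with the familiar tally for the standard code: of the 61 x 9 = 549 single-nucleotide changes from sense codons, 134 are synonymous, 392 are nonsynonymous, and 23 create a stop codon. Either set could then serve as L when computing conditional expected counts over codon states.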
